Efficient Extended Boolean Retrieval
Abstract
Extended Boolean retrieval (EBR) models were proposed nearly three decades ago, but have had little practical impact, despite their significant advantages compared to either ranked keyword or pure Boolean retrieval. In particular, EBR models produce meaningful rankings; their query model allows the representation of complex concepts in an and–or format; and they are scrutable, in that the score assigned to a document depends solely on the content of that document, unaffected by any collection statistics or other external factors. These characteristics make EBR models attractive in domains typified by medical and legal searching, where the emphasis is on iterative development of reproducible complex queries of dozens or even hundreds of terms. However, EBR is much more computationally expensive than the alternatives. We consider the implementation of the p-norm approach to EBR, and demonstrate that ideas used in the max-score and wand exact optimization techniques for ranked keyword retrieval can be adapted to allow selective bypass of documents via a low-cost screening process for this and similar retrieval models. We also propose term independent bounds that are able to further reduce the number of score calculations for short, simple queries under the extended Boolean retrieval model. Together, these methods yield an overall saving of 50% to 80% of the evaluation cost on test queries drawn from biomedical search.
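As a rough illustration of the p-norm scoring mentioned in the abstract, the sketch below evaluates an and–or query tree against a single document using the standard unweighted p-norm operators. The binary term weights, the nested-tuple query representation, and the names `p_norm_or`, `p_norm_and` and `score` are assumptions made for this example and are not taken from the paper; the optimizations the abstract describes (screening and bounds) are not shown.

```python
from typing import Sequence, Set, Tuple, Union

# Hypothetical query representation: a term (str) or a tuple (operator, children).
Query = Union[str, Tuple[str, list]]


def p_norm_or(scores: Sequence[float], p: float) -> float:
    """Unweighted p-norm OR: distance from the all-zeros point, normalized to [0, 1]."""
    n = len(scores)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


def p_norm_and(scores: Sequence[float], p: float) -> float:
    """Unweighted p-norm AND: one minus the normalized distance from the all-ones point."""
    n = len(scores)
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)


def score(node: Query, doc_terms: Set[str], p: float = 2.0) -> float:
    """Recursively score a query tree against one document.

    Assumes binary term weights: 1.0 if the term occurs in the document, else 0.0.
    The result depends only on the document's own terms, not on collection statistics.
    """
    if isinstance(node, str):
        return 1.0 if node in doc_terms else 0.0
    op, children = node
    child_scores = [score(child, doc_terms, p) for child in children]
    if op == "or":
        return p_norm_or(child_scores, p)
    if op == "and":
        return p_norm_and(child_scores, p)
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    query = ("and", [("or", ["heart", "cardiac"]), "surgery"])
    print(score(query, {"cardiac", "surgery"}))
    print(score(query, {"heart", "cardiac", "surgery"}))
```

With p = 1 both operators reduce to the mean of the child scores, and as p grows they approach strict Boolean OR (maximum) and AND (minimum), which is why intermediate values of p give a graded ranking.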
Similar resources
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Boolean and ranked information retrieval for biomedical systematic reviewing
Evidence-based medicine seeks to base clinical decisions on the best currently available scientific evidence and is becoming accepted practice. A key role is played by systematic reviews, which synthesize the biomedical literature and rely on different information retrieval methods to identify a comprehensive set of relevant studies. With Boolean retrieval, the primary retrieval method in this ...
Ranking Documents in Thesaurus-Based Boolean Retrieval Systems
In this paper we investigate document ranking methods in thesaurus-based boolean retrieval systems, and propose a new thesaurus-based ranking algorithm called the Extended Relevance (E-Relevance) algorithm. The E-Relevance algorithm integrates the extended boolean model and the thesaurus-based relevance algorithm. Since the E-Relevance algorithm has all the desirable properties of the extended ...
Adaptive Feedback Methods in an Extended Boolean Model
Relevance feedback methods have been used in information retrieval to generate improved query formulations based on information contained in previously retrieved documents. The relevance feedback techniques have been applied to extended Boolean query formulations as well as to vector query formulations. In this paper, we propose an adaptive way to improve the retrieval performance in an extende...